Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement directory listing #35

Merged
merged 15 commits into from
Jun 10, 2024
Merged

Conversation

bbockelm
Copy link
Collaborator

Implements the ListObjectsV2 query against the S3 endpoint, allowing XRootD to interpret a directory-like structure in S3 as a XRootD directory.

This has been lightly tested against public S3 buckets using curl queries like this:

$ curl -k -H 'Depth: 1' -X PROPFIND https://f4hp7ql65f.local:1094/test/cells/muscle-ibm/endothelial-stromal-cells/ -d @$HOME/.config/xrootd/prop_query

where the query file is as follows:

$ cat ~/.config/xrootd/prop_query 
<d:propfind xmlns:d='DAV:'>
  <d:prop>
  <d:displayname/>
  <d:resourcetype/>
  <d:getcontentlength/>
  <d:getcontenttype/>
  <d:getetag/>
  <d:getlastmodified/>
</d:prop>

It is the first use of query parameters in the signature creation code -- that might need some attention. Also did some modest refactoring around how the requests are passed (mostly to avoid repeating all the configuration strings). Needs to have path-style URLs tested and simple GET/PUT of data to look for regressions.

@bbockelm bbockelm requested a review from jhiemstrawisc May 23, 2024 03:48
Copy link
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remaining comments which cannot be posted as a review comment to avoid GitHub Rate Limit

lint

src/stl_string_utils.cc|154|
src/stl_string_utils.cc|157|

src/HTTPCommands.cc Outdated Show resolved Hide resolved
src/S3AccessInfo.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/stl_string_utils.cc Outdated Show resolved Hide resolved
@bbockelm
Copy link
Collaborator Author

Forgot to mention -- here's the xrootd configuration in use:

s3.begin
s3.path_name        /test
s3.bucket_name      genome-browser
s3.service_name     s3.amazonaws.com
s3.region           us-east-1
s3.service_url      https://s3.us-east-1.amazonaws.com
s3.url_style        virtual
s3.end

src/stl_string_utils.cc Outdated Show resolved Hide resolved
src/stl_string_utils.cc Outdated Show resolved Hide resolved
src/stl_string_utils.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
Adds a new request type, list, and interprets the results as a
directory listing.
Copy link
Member

@jhiemstrawisc jhiemstrawisc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Requests for a few comments in sections that were a bit dense. After merging, can you also open an issue for adding unit tests to some of the easier targets in this? Stuff like URL parsing should be easy enough to come back to shortly.

src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Show resolved Hide resolved
@jhiemstrawisc
Copy link
Member

Further testing shows this also breaks more significantly. With a minimal config of:

all.export  /
xrd.protocol http:8443 libXrdHttp.so
ofs.osslib /workspaces/pelican_xrootd_s3/xrootd-s3-http/build/libXrdS3.so
xrootd.async off

s3.url_style path
s3.begin
s3.path_name /aws-opendata
s3.service_name s3
s3.region us-east-1
s3.service_url https://s3.us-east-1.amazonaws.com
s3.end

The first GET test I usually run results in a segfault. From Curl:

curl -v http://`hostname`:8443/aws-opendata/noaa-wod-pds/MD5SUMS
*   Trying 172.17.0.4:8443...
* Connected to 0f6eebf9123e (172.17.0.4) port 8443 (#0)
> GET /aws-opendata/noaa-wod-pds/MD5SUMS HTTP/1.1
> Host: 0f6eebf9123e:8443
> User-Agent: curl/7.76.1
> Accept: */*
> 
* Empty reply from server
* Closing connection 0
curl: (52) Empty reply from server

And from the server:

s3_Stat: Stat'ing path /aws-opendata/noaa-wod-pds/MD5SUMS
240524 20:07:02 68098 s3_SendRequest: Sending HTTP request https://.s3.us-east-1.amazonaws.com/?delimiter=%2F&list-type=2&max-keys=1000&prefix=noaa-wod-pds%2FMD5SUMS
240524 20:07:02 68098 s3_Stat: Failed to stat path /aws-opendata/noaa-wod-pds/MD5SUMS; response code 0
240524 20:07:02 68098 ofs_stat: unknown.1:29@0f6eebf9123e Unable to locate /aws-opendata/noaa-wod-pds/MD5SUMS; input/output error
240524 20:07:02 68098 http_Req:  XrdHttpReq::Error
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: PostProcessHTTPReq req: 2 reqstate: 0 final_:False
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: PostProcessHTTPReq mapping Xrd error [3007] to status code [500]
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Protocol:  Process. lp:(nil) reqstate: 0
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Req: No checksum requested; skipping to request state 2
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e http_Protocol: Process is exiting rc:0
240524 20:07:02 68098 unknown.1:29@0f6eebf9123e ofs_open: 0-600 (600) fn=/aws-opendata/noaa-wod-pds/MD5SUMS
240524 20:07:02 68098 s3_S3File::Open: Opening file /aws-opendata/noaa-wod-pds/MD5SUMS
Segmentation fault (core dumped)

Are the new URL queries supposed to be added on a basic GET like this? It also looks like the URL generated in s3_SendRequest isn't coming out correctly, because exporting an entire S3 endpoint requires not setting a bucket.

@bbockelm
Copy link
Collaborator Author

Yup - my testing was all for specifying a bucket in the s3.begin ... s3.end block. I can go back and test that with your reproducer.

Can you confirm that if you specify a single bucket it works though?

@jhiemstrawisc
Copy link
Member

jhiemstrawisc commented May 24, 2024

Hardcoding the bucket with the config:

s3.url_style path
s3.begin
s3.path_name /aws-opendata
s3.bucket_name noaa-wod-pds
s3.service_name s3
s3.region us-east-1
s3.service_url https://s3.us-east-1.amazonaws.com
s3.end

still produces a segfault when I curl -v http://$HOSTNAME:8443/aws-opendata/MD5SUMS. I'm also noticing that s3_SendRequest: Sending HTTP request https://noaa-wod-pds.s3.us-east-1.amazonaws.com/?delimiter=%2F&list-type=2&max-keys=1000&prefix=noaa-wod-pds%2FMD5SUMS is constructing a virtual bucket URL instead of the configured path-style url.

@rw2
Copy link
Collaborator

rw2 commented May 28, 2024

I'm back from vacation. Let me know if I can help with any of this, but I don't want to duplicate work.

@jhiemstrawisc
Copy link
Member

@rw2, definitely! Do you want to see if you can start stitching up some of the missing pieces to get this working? We have a lot of interest in this right now.

@rw2
Copy link
Collaborator

rw2 commented May 29, 2024

ok, working on it now. I'll ping you on slack if I have questions.

@bbockelm
Copy link
Collaborator Author

bbockelm commented Jun 4, 2024

@rw2 - any updates on this? I'd like to get the directory listing wrapped into this month's release, which nominally happens on Thursday.

@rw2
Copy link
Collaborator

rw2 commented Jun 4, 2024

Last Thu/Fri: Wouldn't compile for me, which was weird. There must be something different in our environments. Fixed compile. Reproduced it. Got rid of the segfault, but it was a symptom, so the file didn't download. Couldn't work on it yesterday, back at it again this morning.

src/S3File.cc Show resolved Hide resolved
bbockelm added 2 commits June 8, 2024 14:39
Includes:
- Simple refactoring to follow common C++ coding styles.
- Handling of various edge cases when the object is missing or empty.
- Fixing path-based URL mode.
- Fix bucket name parsing from path when a specific bucket is not
  configured.
- Fix segfaults when using continuation tokens.
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Commands.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3Directory.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
src/S3FileSystem.cc Outdated Show resolved Hide resolved
@bbockelm bbockelm requested a review from jhiemstrawisc June 8, 2024 21:56
test/s3_tests.cc Outdated Show resolved Hide resolved
test/s3_tests.cc Outdated Show resolved Hide resolved
test/s3_tests.cc Outdated Show resolved Hide resolved
test/s3_tests.cc Outdated Show resolved Hide resolved
test/s3_tests.cc Outdated Show resolved Hide resolved
test/s3_tests.cc Outdated Show resolved Hide resolved
@bbockelm
Copy link
Collaborator Author

bbockelm commented Jun 9, 2024

@jhiemstrawisc -- ready for re-review!

Beyond fixing the original branch (and some patches from @rw2), this also adds unit tests for the listing functionality and pre-commit settings (so I stop getting caught by the clang-format linter)!

Copy link
Member

@jhiemstrawisc jhiemstrawisc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, and my hand tests all appear to pass.

@jhiemstrawisc jhiemstrawisc merged commit fb810b5 into PelicanPlatform:main Jun 10, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants